Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes
Abstract
It is well known that any finite state Markov decision process (MDP) has a deterministic memoryless policy that maximizes the discounted long-term expected reward. Hence for such MDPs the optimal control problem can be solved over the set of memoryless deterministic policies. In the case of partially observable Markov decision processes (POMDPs), where there is uncertainty about the world state, optimal policies must generally be stochastic if no additional information, such as the observation history, is available. In the context of embodied artificial intelligence and systems design, an agent's policy is subject to hard physical constraints and must be as efficient as possible. With this in mind, we focus on memoryless POMDPs. We cast the optimization problem as a constrained linear optimization problem and develop a corresponding geometric framework. We show that any POMDP has an optimal memoryless policy of limited stochasticity, which means that we can give an upper bound on the number of deterministic policies that need to be mixed to obtain an optimal stationary policy, regardless of the specific reward function.
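The claim that optimal memoryless POMDP policies are generally stochastic can be illustrated on a toy instance. The following sketch (a hypothetical example constructed for illustration, not taken from the paper) uses a 2-state POMDP in which both states emit the same observation, so a memoryless policy reduces to a single probability p = Pr[a0]; scanning p shows that the best such policy is strictly in the interior, beating both deterministic policies:

```python
import numpy as np

# Hypothetical toy POMDP (illustration only, not the paper's construction):
# two hidden states, one shared observation, two actions.
# a0: state 0 -> state 1 (reward +1); state 1 -> state 1 (reward -1)
# a1: state 0 -> state 0 (reward -1); state 1 -> state 0 (reward +1)
gamma = 0.9
P = {0: np.array([[0.0, 1.0], [0.0, 1.0]]),
     1: np.array([[1.0, 0.0], [1.0, 0.0]])}
R = {0: np.array([1.0, -1.0]),
     1: np.array([-1.0, 1.0])}

def value(p):
    """Discounted value of the memoryless policy Pr[a0] = p,
    averaged over a uniform start distribution."""
    P_pi = p * P[0] + (1 - p) * P[1]   # policy-averaged transition matrix
    r_pi = p * R[0] + (1 - p) * R[1]   # policy-averaged expected reward
    # Solve V = r_pi + gamma * P_pi V  for the fixed-policy value vector.
    v = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
    return v.mean()

grid = np.linspace(0.0, 1.0, 101)
best = max(grid, key=value)
print(best, value(best))       # interior optimum (here at p = 0.5)
print(value(0.0), value(1.0))  # both deterministic policies do worse
```

Since the agent cannot distinguish the two states, any deterministic choice eventually gets trapped collecting reward −1 every step, while randomizing hedges between the states; this is exactly the phenomenon that forces the mixing of deterministic policies bounded in the paper.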
Similar Resources
A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems
Maintenance can be a factor in either increasing or decreasing a system's availability, so it is valuable to evaluate a maintenance policy from both the cost and the availability points of view, simultaneously and according to the decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...
Optimal Control for Partially Observable Markov Decision Processes over an Infinite Horizon
In this paper we consider an optimal control problem for partially observable Markov decision processes with finite states, signals and actions over an infinite horizon. It is shown that there are ε-optimal piecewise-linear value functions and piecewise-constant policies which are simple. Simple means that there are only finitely many pieces, each of which is defined on a convex polyhedral set...
Learning in non-stationary Partially Observable Markov Decision Processes
We study the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is not perfectly known and may change over time. We present the algorithm MEDUSA+, which incrementally improves a POMDP model using selected queries, while still optimizing the reward. Empirical results show the response of the algorithm to changes in the parameters of a m...
Good Policies for Partially Observable Markov Decision Processes Are Hard to Find
Optimal policy computation in finite-horizon Markov decision processes is a classical problem in optimization with many practical applications. For stationary policies and infinite horizon it is known to be solvable in polynomial time by linear programming, whereas for finite horizon it is a long-standing open problem. We consider this problem for a slightly generalized model, namely partially obse...
Optimal control of infinite horizon partially observable decision processes modelled as generators of probabilistic regular languages
Decision processes with incomplete state feedback have traditionally been modelled as partially observable Markov decision processes. In this article, we present an alternative formulation based on probabilistic regular languages. The proposed approach generalises the recently reported work on language-measure-theoretic optimal control for perfectly observable situations and shows that such a f...
Journal: CoRR
Volume: abs/1503.07206
Pages: -
Publication year: 2015